library(ggplot2)
library(rjson)
library(rvest)
## Loading required package: xml2
library(stringr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Import CSV

I have converted the JSON file into CSV for later data analysis.

I mutate the “positionTS” column to strip the unnecessary key prefix and trailing comma.

df<-
  df %>% 
  mutate(positionTS=str_replace(positionTS,'"positionTS":',"")) %>% 
  mutate(positionTS=str_replace(positionTS,',',""))
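As a minimal sketch of what the two `str_replace()` calls above do, the example below applies them to two made-up strings. The sample values are assumptions chosen only to resemble the raw field, and `fixed()` is used so the patterns are matched literally rather than as regular expressions.

```r
library(stringr)

# Made-up raw values shaped like the field the pipeline cleans (not real rows)
raw <- c('"positionTS":1485272955000,', '"positionTS":1485274096000,')

# Drop the key prefix, then the trailing comma; fixed() matches literally
cleaned <- str_replace(raw, fixed('"positionTS":'), "")
cleaned <- str_replace(cleaned, fixed(","), "")

cleaned
```

Only the digits remain, still stored as character strings, which is why the next step converts them with `as.numeric()`.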

Convert the original timestamps (milliseconds since 1970-01-01) into a more readable date-time format.

z<-as.numeric(df[,"positionTS"])
z<-as.POSIXct((z+0.1)/1000, origin = "1970-01-01")
df[,"positionTS"]<-z
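The conversion can be checked on a couple of values in isolation. This is a sketch only: the two millisecond values are illustrative, the time zone is set explicitly to "America/New_York" as an assumption (the chunk above relies on the session's local time zone), and the `+ 0.1` offset from the chunk above is left out.

```r
# Illustrative epoch-millisecond values (assumed, not read from the data file)
ms <- c(1485272955000, 1485274096000)

# Divide by 1000 to get seconds, then convert with an explicit time zone
ts <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "America/New_York")

format(ts, "%Y-%m-%d %H:%M:%S")
```

Passing `tz` explicitly keeps the printed times the same regardless of the machine the report is knitted on.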

Clean column names

df<-
  df %>% 
 janitor::clean_names()

The records in this dataset were collected between “2017-01-24 10:49:15 EST” and “2017-01-24 11:08:16 EST”.

range(df$position_ts)
## [1] "2017-01-24 10:49:15 EST" "2017-01-24 11:08:16 EST"

Is every tag detected at the same frequency? Is the number of detections the same for each tag?

Is every tag detected during the same period of time? If so, when?

tbl_freq_detected=df %>% 
  group_by(id,name) %>% 
  summarize(n=n())
tbl_freq_detected
## # A tibble: 26 x 3
## # Groups:   id [?]
##    id           name        n
##    <fct>        <fct>   <int>
##  1 b4994c876dbb 024 res   172
##  2 b4994c876dcb 002 dev  5417
##  3 b4994c876de6 019 res  5439
##  4 b4994c877897 003 dev  5359
##  5 b4994c877aa1 005 dev  5456
##  6 b4994c877cb8 006 dev  5508
##  7 b4994c877d82 021 res   295
##  8 b4994c877eca 007 dev  3874
##  9 b4994c877ee8 012 pub  5468
## 10 b4994c877fa2 017 res  5474
## # ... with 16 more rows

According to the table above, most tags were detected more than 5,000 times, whereas a small number of tags were detected fewer than 300 times.
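To bring the rarely detected tags to the top instead of reading through all 26 rows, the counts can be sorted in ascending order. The sketch below uses a made-up data frame `toy` (an assumption standing in for `df`) and `dplyr::count()`, which gives the same counts as `group_by()` followed by `summarize(n = n())`.

```r
library(dplyr)

# Made-up detections: tag "x" is seen five times, tag "y" once
toy <- data.frame(
  id = c(rep("x", 5), "y"),
  name = c(rep("001 dev", 5), "002 res"),
  stringsAsFactors = FALSE
)

toy_counts <- toy %>%
  count(id, name) %>%  # one row per tag with its number of detections
  arrange(n)           # rarest tags first

toy_counts
```

Applied to `df`, the same two verbs would list the low-count tags first.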

Does this mean that some tags were detected over a shorter period? Let’s check the interval between the first and last detection for each tag.

tbl_interval_detection=df %>% 
  group_by(id,name) %>% 
  summarize(start=min(position_ts),end=max(position_ts))

tbl_interval_detection
## # A tibble: 26 x 4
## # Groups:   id [?]
##    id           name    start               end                
##    <fct>        <fct>   <dttm>              <dttm>             
##  1 b4994c876dbb 024 res 2017-01-24 10:50:11 2017-01-24 11:08:10
##  2 b4994c876dcb 002 dev 2017-01-24 10:49:19 2017-01-24 11:08:16
##  3 b4994c876de6 019 res 2017-01-24 10:49:19 2017-01-24 11:08:16
##  4 b4994c877897 003 dev 2017-01-24 10:49:19 2017-01-24 11:08:16
##  5 b4994c877aa1 005 dev 2017-01-24 10:49:19 2017-01-24 11:08:16
##  6 b4994c877cb8 006 dev 2017-01-24 10:49:19 2017-01-24 11:08:16
##  7 b4994c877d82 021 res 2017-01-24 10:49:15 2017-01-24 11:08:08
##  8 b4994c877eca 007 dev 2017-01-24 10:49:19 2017-01-24 11:08:16
##  9 b4994c877ee8 012 pub 2017-01-24 10:49:19 2017-01-24 11:08:16
## 10 b4994c877fa2 017 res 2017-01-24 10:49:19 2017-01-24 11:08:16
## # ... with 16 more rows

According to the table ‘tbl_interval_detection’, the interval for each tag is about the same, starting around 10:49 AM and ending around 11:08 AM.

Maybe tags are detected at different frequencies?
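One way to test this is to divide each tag's detection count by the length of its detection window. Using the rows shown above, tag 002 dev has 5417 detections over roughly 1137 seconds (about 4.8 per second), while tag 024 res has 172 over roughly 1079 seconds (about 0.16 per second). The sketch below computes the same rate on a synthetic data frame `toy`; the data and the column names `span_secs` and `per_sec` are assumptions made for illustration.

```r
library(dplyr)

# Synthetic stand-in for df: tag "a" is seen every second, tag "b" every 10 s
t0 <- as.POSIXct("2017-01-24 10:49:15", tz = "America/New_York")
toy <- data.frame(
  id = c(rep("a", 61), rep("b", 7)),
  position_ts = c(t0 + 0:60, t0 + seq(0, 60, by = 10)),
  stringsAsFactors = FALSE
)

rate_tbl <- toy %>%
  group_by(id) %>%
  summarize(
    n = n(),
    # window length in seconds between first and last detection
    span_secs = as.numeric(difftime(max(position_ts), min(position_ts),
                                    units = "secs")),
    per_sec = n / span_secs
  )

rate_tbl
```

Both synthetic tags cover the same 60-second window, so the difference in `per_sec` comes entirely from how often each one is detected.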

Let’s plot ID vs. Timestamp for each ID tag to get a bigger picture.

The Plotly view lets us read off the exact timestamp of each detection by hovering over a point in the graph.

ggplot_timestamp=df %>% 
  ggplot(aes(x = id, y = position_ts, ymin = min(position_ts), 
             ymax = max(position_ts))) +
  geom_point(size=0.08) +
  theme(axis.text.x = element_text(angle = 90,size = 8)) +
  theme(legend.position = "none")+
  labs(x="ID",y="Timestamp",title="ID tag vs. Timestamp") 

ggplotly(ggplot_timestamp)

According to the “ggplot_timestamp” plot, several tags were not detected as frequently as most of the others.

What about location plots for each tag?

Since all values in smoothed_position_003 (the z-coordinate) are zero, I plot a 2D diagram instead of a 3D one.

By clicking an id in the legend on the right, we can inspect the x and y positions for a specific ID independently.
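A check of this kind takes a single line. The sketch below runs it on a made-up data frame `toy` (an assumption standing in for `df`), with the column name inferred from the `smoothed_position_001` and `smoothed_position_002` columns used in the plot.

```r
# Made-up stand-in for df with a constant z-coordinate
toy <- data.frame(
  smoothed_position_001 = c(1.2, 3.4, 5.6),
  smoothed_position_003 = c(0, 0, 0)
)

# TRUE only if every z value is exactly zero
z_all_zero <- all(toy$smoothed_position_003 == 0)
z_all_zero
```

If the real column held missing values, `all()` could return `NA`, so `na.rm = TRUE` may be worth adding in that case.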

library(plotly)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble  1.4.2     ✔ readr   1.1.1
## ✔ tidyr   0.8.1     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ plotly::filter()        masks dplyr::filter(), stats::filter()
## ✖ readr::guess_encoding() masks rvest::guess_encoding()
## ✖ dplyr::lag()            masks stats::lag()
## ✖ purrr::pluck()          masks rvest::pluck()
library(viridis)
## Loading required package: viridisLite
scatter_position=df %>%
  ggplot(aes(x = smoothed_position_001, y = smoothed_position_002, color = id)) +
  geom_point(alpha = 0.15) +
  theme_classic()

ggplotly(scatter_position)

df %>% 
  ggplot(aes(x =position_ts,y=position_accuracy,color=id)) + 
  geom_point()+
  geom_line()

Mean of PositionAccuracy By Tag ID (26 Distinct)

position_accuracy=df %>% 
  group_by(id,name) %>% 
  summarize(avg_accuracy= mean(position_accuracy))

position_accuracy %>% 
  ggplot(aes(x=name,y=avg_accuracy,color=id))+
  geom_point()

The tags come in 4 different colors; the mean position accuracy for each color is:

df %>% 
  group_by(color) %>% 
  summarize(avg= mean(position_accuracy)) %>% 
  View()